Q-functionals for Value-Based Continuous Control

Abstract

We present Q-functionals, an alternative architecture for continuous control deep reinforcement learning. Instead of returning a single value for a state-action pair, our network transforms a state into a function that can be rapidly evaluated in parallel for many actions, allowing us to efficiently choose high-value actions through sampling. This contrasts with typical off-policy control, where a policy is trained for the sole purpose of selecting actions from the Q-function. We represent our action-dependent Q-function as a weighted sum of basis functions (Fourier, polynomial, etc.) over the action space, where the weights are state-dependent and output by the Q-functional network. Fast sampling makes practical a variety of techniques that require Monte-Carlo integration over Q-functions, and enables action-selection strategies besides simple value-maximization. We characterize our framework, describe various implementations, and demonstrate strong performance on a suite of tasks.
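The idea of a Q-function expressed as a weighted sum of basis functions, with action selection by sampling, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the one-dimensional action range [-1, 1], the polynomial basis, the uniform sampler, the function names, and the hard-coded `weights` that stand in for the output of a Q-functional network at a single state. A practical implementation would likely evaluate all candidate actions as one batched tensor operation rather than in a Python loop.

```python
import random

def polynomial_basis(action, degree):
    """Monomial features [1, a, a^2, ..., a^degree] for a scalar action."""
    return [action ** k for k in range(degree + 1)]

def q_value(weights, action):
    """Q(s, a) = sum_k w_k(s) * phi_k(a), with state-dependent weights w(s)."""
    features = polynomial_basis(action, len(weights) - 1)
    return sum(w * f for w, f in zip(weights, features))

def select_action(weights, n_samples, rng):
    """Sample candidate actions uniformly in [-1, 1]; return the best one and its value."""
    candidates = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
    best = max(candidates, key=lambda a: q_value(weights, a))
    return best, q_value(weights, best)

def mean_q(weights, n_samples, rng):
    """Monte-Carlo estimate of the average of Q over uniformly sampled actions."""
    samples = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
    return sum(q_value(weights, a) for a in samples) / n_samples

# Hypothetical weights standing in for the network output at one state:
# Q(s, a) = 1 + 0.6*a - a^2, whose maximum over a is at a = 0.3.
weights = [1.0, 0.6, -1.0]
rng = random.Random(0)
action, value = select_action(weights, 1000, rng)
average = mean_q(weights, 1000, rng)
```

Because the weights are computed once per state, scoring an extra candidate action costs only a basis evaluation and a dot product, which is what makes both the sampled maximization and the Monte-Carlo average cheap in this sketch.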

Similar resources

A Q-learning Based Continuous Tuning of Fuzzy Wall Tracking

A simple, easy-to-implement algorithm is proposed to address the wall-tracking task of an autonomous robot. The robot should navigate in unknown environments, find the nearest wall, and track it solely based on locally sensed data. The proposed method benefits from coupling fuzzy logic and Q-learning to meet the requirements of autonomous navigation. Fuzzy if-then rules provide a reliable decision maki...

Q-learning for Optimal Control of Continuous-time Systems

In this paper, two Q-learning (QL) methods are proposed and their convergence theories are established for addressing the model-free optimal control problem of general nonlinear continuous-time systems. By introducing the Q-function for continuous-time systems, policy iteration based QL (PIQL) and value iteration based QL (VIQL) algorithms are proposed for learning the optimal control policy fr...

Point-Based Value Iteration for Continuous POMDPs

We propose a novel approach to optimize Partially Observable Markov Decision Processes (POMDPs) defined on continuous spaces. To date, most algorithms for model-based POMDPs are restricted to discrete states, actions, and observations, but many real-world problems, such as robot navigation, are naturally defined on continuous spaces. In this work, we demonstrate that the value fu...

Q Memory based active learning for optimizing noisy continuous functions

This paper introduces a new algorithm, Q, for optimizing the expected output of a multi-input noisy continuous function. Q is designed to need only a few experiments, it avoids strong assumptions on the form of the function, and it is autonomous in that it requires little problem-specific tweaking. These capabilities are directly applicable to industrial processes and may become increasingly valuab...

Banach Algebra of Continuous Functionals and the Space of Real-Valued Continuous Functionals with Bounded Support

In this article, we give a definition of a functional space which is constructed from all continuous functions defined on a compact topological space. We prove that this functional space is a Banach algebra. Next, we give a definition of a function space which is constructed from all real-valued continuous functions with bounded support. We prove that this function space is a real normed space.

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2023

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v37i7.26073